Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Abstract
Large-scale distributed training requires significant communication bandwidth for gradient exchange, which limits the scalability of multi-node training and demands expensive high-bandwidth network infrastructure. The situation is even worse for distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections. In this paper, we find that 99.9% of the gradient exchange in distributed SGD is redundant, and propose Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth. To preserve accuracy during this compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and warm-up training. We have applied Deep Gradient Compression to image classification, speech recognition, and language modeling with multiple datasets including Cifar10, ImageNet, Penn Treebank, and the Librispeech Corpus. In these scenarios, Deep Gradient Compression achieves a gradient compression ratio from 270x to 600x without losing accuracy, cutting the gradient size of ResNet-50 from 97MB to 0.35MB, and that of DeepSpeech from 488MB to 0.74MB. Deep Gradient Compression enables large-scale distributed training on inexpensive commodity 1Gbps Ethernet and facilitates distributed training on mobile devices.
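To make the core mechanics concrete, the sketch below illustrates DGC-style top-0.1% gradient sparsification combined with momentum correction and momentum factor masking, written in plain NumPy. It is a minimal single-tensor sketch under stated assumptions, not the paper's implementation: the names (sparsify_with_momentum_correction, velocity, residual, sparsity) are illustrative choices, and local gradient clipping, the warm-up schedule, and the collective exchange of the sparse updates across workers are omitted.

import numpy as np

def sparsify_with_momentum_correction(grad, velocity, residual,
                                      momentum=0.9, sparsity=0.999):
    """Transmit only the largest ~0.1% of accumulated values; the rest
    stay in local accumulators and are carried to later iterations."""
    # Momentum correction: accumulate the velocity locally, then add the
    # velocity (not the raw gradient) into the residual accumulator.
    velocity = momentum * velocity + grad
    residual = residual + velocity

    # Select the top (1 - sparsity) fraction of entries by magnitude.
    k = max(1, int(residual.size * (1.0 - sparsity)))
    threshold = np.partition(np.abs(residual).ravel(), -k)[-k]
    mask = np.abs(residual) >= threshold

    # The sparse update is what a worker would communicate; transmitted
    # entries are cleared locally, and momentum factor masking also
    # zeroes the velocity at those positions to avoid stale momentum.
    sparse_update = np.where(mask, residual, 0.0)
    residual = np.where(mask, 0.0, residual)
    velocity = np.where(mask, 0.0, velocity)
    return sparse_update, velocity, residual

In a full training loop, each worker would call this per gradient tensor every iteration and exchange only the nonzero entries of sparse_update, which is the source of the 270x to 600x reduction in communicated gradient volume reported above.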